CORS-4334: Konnectivity#10344
Conversation
|
@patrickdillon: This pull request references CORS-4334 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/wip |
|
@patrickdillon: This pull request references CORS-4334 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
1d992e7 to
836e8d2
Compare
|
/test e2e-vsphere-ovn e2e-nutanix-ovn |
|
/test e2e-metal-ipi-ovn |
|
We probably want to clean up the konnectivity ports on bootstrap destroy as well. |
|
I have experimented with adding a feature gate to control this and it is possible. |
|
Need to not deploy this on a true single node cluster. |
|
Have read through the changes and the scripts all seem reasonable to me. I'll open a PR to CAPIO that switches us back to Fail webhook policy to test this with |
|
/retest-required |
2 similar comments
|
/retest-required |
|
/retest-required |
|
/cc @sadasu @jhixson74 |
|
/retest |
tthvo
left a comment
There was a problem hiding this comment.
This is pretty cool 😎🔥! I just have a questions and comments while learning/reading about this :D
836e8d2 to
ce99dcf
Compare
WalkthroughAdds Konnectivity bootstrap: new manifests (server, agent, namespace, egress config, secret, DaemonSet), bootstrap scripts and cert generator, bootkube integration, and cloud-provider changes to allow and clean up port 8091 ingress across providers; also updates service analysis logging for Konnectivity stages. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes ✨ Finishing Touches🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 golangci-lint (2.11.3)Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions Comment |
|
In a followup I will add gating for this, but I wanted to leave it ungated for the moment to do another round of tests |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (3)
data/data/bootstrap/files/usr/local/bin/konnectivity-certs.sh (1)
17-22: Consider restricting private key file permissions.The generated private keys (
ca.key,server.key,agent.key) are created with default umask permissions. For defense in depth, consider explicitly setting restrictive permissions.🔒 Proposed fix to restrict key permissions
+# Ensure restrictive permissions on key files +umask 077 + # Generate self-signed Konnectivity CA openssl req -x509 -newkey rsa:2048 -nodes \Or alternatively, after each key generation:
chmod 600 "${KONNECTIVITY_CERT_DIR}"/*.key🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@data/data/bootstrap/files/usr/local/bin/konnectivity-certs.sh` around lines 17 - 22, The script konnectivity-certs.sh generates private keys (ca.key, server.key, agent.key) using openssl without setting restrictive permissions; after each key is created (identify the openssl req/newkey invocations that write "${KONNECTIVITY_CERT_DIR}/ca.key" and the similar server/agent key writes), explicitly set file mode to 0600 for those private key files (for example by calling chmod 600 on each generated .key or on "${KONNECTIVITY_CERT_DIR}"/*.key immediately after generation) to ensure keys are not world-readable.pkg/infrastructure/aws/clusterapi/aws.go (1)
531-565: Consider renamingremoveSSHRuletoremoveBootstrapRulesfor clarity.The function now removes both SSH and Konnectivity rules, but the name and log message (line 561) still reference only SSH.
📝 Optional naming improvement
-// removeSSHRule removes the SSH rule for accessing the bootstrap node -// by removing the rule from the cluster spec and updating the object. -func removeSSHRule(ctx context.Context, cl k8sClient.Client, infraID string) error { +// removeBootstrapRules removes bootstrap-only ingress rules (SSH and Konnectivity) +// by removing them from the cluster spec and updating the object. +func removeBootstrapRules(ctx context.Context, cl k8sClient.Client, infraID string) error {And update the log message:
- logrus.Debug("Updated AWSCluster to remove bootstrap SSH rule") + logrus.Debug("Updated AWSCluster to remove bootstrap ingress rules")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/infrastructure/aws/clusterapi/aws.go` around lines 531 - 565, The function removeSSHRule should be renamed to removeBootstrapRules and its internal log message (the logrus.Debug call that currently says "Updated AWSCluster to remove bootstrap SSH rule") should be updated to reflect that both SSH and Konnectivity rules are removed; update references to awsmanifest.BootstrapSSHDescription and awsmanifest.BootstrapKonnectivityDescription remain unchanged, and ensure you also rename any callers/imports of removeSSHRule to the new removeBootstrapRules name to keep compilation intact.data/data/bootstrap/files/opt/openshift/konnectivity-agent-daemonset.yaml (1)
23-54: Consider adding explicit securityContext to harden the container.While
hostNetwork: trueis required for konnectivity-agent functionality, adding explicit security controls can improve the pod's security posture. Even for bootstrap-only workloads, definingsecurityContextwithallowPrivilegeEscalation: falseis a reasonable hardening step if the application permits it.🔒 Optional security hardening
containers: - name: konnectivity-agent image: ${KONNECTIVITY_IMAGE} + securityContext: + allowPrivilegeEscalation: false + capabilities: + drop: + - ALL command: - /usr/bin/proxy-agent🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@data/data/bootstrap/files/opt/openshift/konnectivity-agent-daemonset.yaml` around lines 23 - 54, Add an explicit securityContext to the konnectivity-agent container to harden it: in the PodSpec under the container named "konnectivity-agent" add a securityContext block (container-level) setting allowPrivilegeEscalation: false and, if the binary supports non-root, set runAsNonRoot: true and a non-zero runAsUser; also consider readOnlyRootFilesystem: true and dropping all capabilities (capabilities: drop: ["ALL"]) to minimize privileges while ensuring the agent still functions with hostNetwork: true.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@data/data/bootstrap/files/opt/openshift/konnectivity-server-pod.yaml`:
- Line 25: The Unix socket path for konnectivity-server is inconsistent:
konnectivity-server-pod.yaml creates the socket at
/etc/kubernetes/bootstrap-configs/konnectivity-server.socket but
egress-selector-config.yaml points to
/etc/kubernetes/config/konnectivity-server.socket; update one to match the
other. Fix by either changing the socket path in egress-selector-config.yaml to
/etc/kubernetes/bootstrap-configs/konnectivity-server.socket or altering the
konnectivity-server launch/config in konnectivity-server-pod.yaml to create the
socket under /etc/kubernetes/config/konnectivity-server.socket so both
references to konnectivity-server.socket match exactly.
In `@data/data/bootstrap/files/usr/local/bin/konnectivity.sh.template`:
- Around line 42-61: konnectivity_manifests() currently exports
BOOTSTRAP_NODE_IP but assumes konnectivity_setup() populated it, which can lead
to an empty value in the daemonset manifest; update konnectivity_manifests() to
either re-detect and set BOOTSTRAP_NODE_IP using the same logic as
konnectivity_setup() (so envsubst produces a valid value) or validate that
BOOTSTRAP_NODE_IP is non-empty before calling envsubst and abort/record failure
with a clear error if it's unset; reference the konnectivity_manifests()
function, the BOOTSTRAP_NODE_IP variable, and the envsubst call that writes
manifests/konnectivity-agent-daemonset.yaml when making the change.
---
Nitpick comments:
In `@data/data/bootstrap/files/opt/openshift/konnectivity-agent-daemonset.yaml`:
- Around line 23-54: Add an explicit securityContext to the konnectivity-agent
container to harden it: in the PodSpec under the container named
"konnectivity-agent" add a securityContext block (container-level) setting
allowPrivilegeEscalation: false and, if the binary supports non-root, set
runAsNonRoot: true and a non-zero runAsUser; also consider
readOnlyRootFilesystem: true and dropping all capabilities (capabilities: drop:
["ALL"]) to minimize privileges while ensuring the agent still functions with
hostNetwork: true.
In `@data/data/bootstrap/files/usr/local/bin/konnectivity-certs.sh`:
- Around line 17-22: The script konnectivity-certs.sh generates private keys
(ca.key, server.key, agent.key) using openssl without setting restrictive
permissions; after each key is created (identify the openssl req/newkey
invocations that write "${KONNECTIVITY_CERT_DIR}/ca.key" and the similar
server/agent key writes), explicitly set file mode to 0600 for those private key
files (for example by calling chmod 600 on each generated .key or on
"${KONNECTIVITY_CERT_DIR}"/*.key immediately after generation) to ensure keys
are not world-readable.
In `@pkg/infrastructure/aws/clusterapi/aws.go`:
- Around line 531-565: The function removeSSHRule should be renamed to
removeBootstrapRules and its internal log message (the logrus.Debug call that
currently says "Updated AWSCluster to remove bootstrap SSH rule") should be
updated to reflect that both SSH and Konnectivity rules are removed; update
references to awsmanifest.BootstrapSSHDescription and
awsmanifest.BootstrapKonnectivityDescription remain unchanged, and ensure you
also rename any callers/imports of removeSSHRule to the new removeBootstrapRules
name to keep compilation intact.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: d7c2cba8-f144-4052-bf2f-59e7b3332489
📒 Files selected for processing (18)
data/data/bootstrap/files/opt/openshift/egress-selector-config.yamldata/data/bootstrap/files/opt/openshift/konnectivity-agent-certs-secret.yamldata/data/bootstrap/files/opt/openshift/konnectivity-agent-daemonset.yamldata/data/bootstrap/files/opt/openshift/konnectivity-config-override.yamldata/data/bootstrap/files/opt/openshift/konnectivity-namespace.yamldata/data/bootstrap/files/opt/openshift/konnectivity-server-pod.yamldata/data/bootstrap/files/usr/local/bin/bootkube.sh.templatedata/data/bootstrap/files/usr/local/bin/konnectivity-certs.shdata/data/bootstrap/files/usr/local/bin/konnectivity.sh.templatepkg/asset/manifests/aws/cluster.gopkg/asset/manifests/azure/cluster.gopkg/asset/manifests/ibmcloud/securitygroups.gopkg/asset/manifests/powervs/securitygroups.gopkg/gather/service/analyze.gopkg/infrastructure/aws/clusterapi/aws.gopkg/infrastructure/azure/azure.gopkg/infrastructure/gcp/clusterapi/firewallrules.gopkg/infrastructure/openstack/preprovision/securitygroups.go
ce99dcf to
80cb9f9
Compare
|
@CodeRabbit review |
|
/retest-required |
|
/payload-job periodic-ci-openshift-release-main-nightly-5.0-e2e-aws-ovn-single-node-techpreview |
|
@tthvo: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/f1287fa0-65c2-11f1-918e-d510a8a6bcaa-0 |
|
/test e2e-azure-ovn-techpreview e2e-aws-ovn-dualstack-ipv4-primary-techpreview e2e-aws-ovn-single-node e2e-azure-ovn-techpreview e2e-metal-single-node-live-iso |
|
/test e2e-aws-ovn-fips |
|
@tthvo: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/738edb50-65c9-11f1-9131-ea27653968f2-0 |
|
/payload-job periodic-ci-openshift-openshift-tests-private-release-5.0-multi-nightly-aws-eusc-ipi-fips-tp-arm-f7 Let's run fips to check konnectivity certs are OK 🤔 And why not custom-dns too |
|
@tthvo: trigger 3 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/e90abbf0-65ca-11f1-8f46-61edf3ce7252-0 |
|
/payload-job periodic-ci-openshift-openshift-tests-private-release-5.0-multi-nightly-aws-eusc-ipi-fips-tp-f28-destructive I forgot we can only run arm64 payload job on PRs :D |
|
@tthvo: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/e6f14970-65d8-11f1-9010-ff1b1af3885c-0 |
|
/test e2e-aws-ovn-techpreview e2e-azure-ovn-techpreview |
|
/verified by e2e-aws-ovn-techpreview |
|
@patrickdillon: This PR has been marked as verified by DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/tide refresh |
|
/override ci/prow/e2e-azure-ovn These tests failed only on a test that is currently skipped, see openshift/origin#31297 |
|
@JoelSpeed: Overrode contexts on behalf of JoelSpeed: ci/prow/e2e-azure-ovn, ci/prow/e2e-openstack-ovn DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/retest-required |
|
/test e2e-gcp-ovn |
|
/retest-required |
|
/test e2e-gcp-ovn |
|
@patrickdillon Are there any of these optional jobs that you see as important to have passed before this PR merges? |
No. None of the failures look relevant. /skip |
|
@patrickdillon: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
/override ci/prow/e2e-gcp-ovn failure looks unrelated, caught in retest hell |
|
@patrickdillon: Overrode contexts on behalf of patrickdillon: ci/prow/e2e-gcp-ovn DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Continuation of #10280:
Will break the API vendoring into a separate PR to get that merged sooner rather than later.
Not tested. Opening this now as a /WIP to continue discussion of #10280 with #9628
/cc @JoelSpeed @mdbooth